Abstract. We study the linear large-n behavior of the average number of distinct 
sites S(n) visited by a random walker after n steps on a large random graph. An 
expression for the graph topology-dependent prefactor B in S(n) = Bn is proposed. 
We use generating function techniques to relate this prefactor to the graph adjacency 
matrix and then devise message-passing equations to calculate its value. Numerical 
simulations are performed to evaluate the agreement between the message passing 
predictions and random walk simulations on random graphs. Scaling with system size 
and average graph connectivity are also analysed. 
1. Introduction. 

The average number of distinct sites S(n) visited by a random walker of n steps moving 
on a graph provides important information about the geometry of the coverage of vertices 
on the graph. The problem of characterizing this quantity S(n) as a function of time 
n finds interdisciplinary applications such as in target decay [1] and trapping problems 
[2] in chemical reactions, in the problem of annealing of point defects in crystals [3], in 
relaxation problems in disordered systems [4] or in problems of dynamics on the internet 
[5]. Further studies have characterized the same quantity when multiple walkers are 
moving together PE]. 

The problem has been widely studied (in the limit $n \gg 1$) in the case of d-dimensional 
lattices laiiniin] where a number of independent studies all show that for $d \geq 3$ the 
average number of distinct visited sites grows linearly in time as $S(n) = n/W(d)$ with 
a prefactor $1/W(d)$ dependent on the dimension; whereas in d = 1, 2 this growth is 
slower, with $S(n) = \sqrt{8n/\pi}$ and $S(n) = \pi n/\ln n$, respectively. In the case of Bethe 
lattices of connectivity k the behaviour is linear again [12], with a prefactor dependent 
on the lattice connectivity, $S(n) = [(k-2)/(k-1)]\,n$. This problem has also been tackled 
for graphs other than lattices or random graphs by using the spectral 
dimension $\tilde{d}$. Under certain assumptions $S(n) \sim n^{\min(1,\tilde{d}/2)}$ for $\tilde{d} \neq 2$. The 
quantity $\tilde{d}$ has been calculated for complex types of graphs such as decimable fractals, 
bundled structures, fractal trees and d-simplex. See iniiii] for an overview. Nonetheless 
the determination of the prefactor remains an open question for these complex types 
of graphs. 

The situation where the underlying topology is a random network has only recently 
been studied; in particular it has been found that for Scale-Free graphs (SF) [15, 16] 
(in the time regime $n \gg 1$) one recovers the linear behaviour $S(n) \sim n$ seen in both 
Bethe lattices and d-dimensional lattices for $d \geq 3$. However, there is very limited 
information on the prefactor B describing this linear behavior S(n) = Bn on random 
networks. Indeed all the studies referred to above are based on a scaling ansatz and 
on the analysis of numerical simulations; neither provides a theoretical framework that 
fully characterizes the prefactor B to the same extent as has been achieved for lattices. 
The difficulty in setting up a theoretical model to characterize this prefactor is due to 
the asymmetry between forward and backward steps during the walk; this asymmetry 
is induced by the random nature of the graph structure, where nodes have a number of 
neighbours (degree) that is a random quantity extracted from a probability distribution. 
In this work we combine a general generating function approach, valid also for lattices, 
with the cavity formalism [17, 18] that has proved to be useful in a wide range of 
other problems in statistical physics [19]. We derive an approximate expression for the 
topology dependent prefactor B that is valid in the thermodynamic limit of large graphs, 
and for $n \gg 1$. We develop message-passing equations to calculate its value and perform 
numerical simulations on different graph topologies. Finally we describe the behaviour 
of S(n) in three different time regimes through scaling considerations. We propose this 
framework as an alternative tool to the standard ones used in the case of lattices. 

The paper is organized as follows: in section 2 we introduce the general model and 
the notation used to describe a random walk on random networks. Section 3 sets out 
the generating function approach to the problem. In section 4 we then adapt it to the 
particular case of random networks. Our main results are derived using message-passing 
techniques in section 5, leading to an explicit relation between the topology dependent 
prefactor and the cavity marginals. In section 6 we present and discuss the results of 
numerical simulations, including the scaling for finite graphs. We conclude in section 7 
with a brief summary and outlook. 


2. Random walks on graphs. 


Given a random graph $\mathcal{G}(\mathcal{V},\mathcal{E})$ with $V = |\mathcal{V}|$ nodes and $E = |\mathcal{E}|$ edges, we denote the 
neighbourhood of a node $i \in \mathcal{V}$ by $\partial i$, and its degree, i.e. the number of neighbours, 
by $k_i = |\partial i|$. An overall characterization of the graph topology is then provided by the 
distribution of the degrees $k_i$, which we write as $P(k)$. 

Introducing matrix notation we define the graph adjacency matrix A as the matrix with 
entries 

$$ a_{ij} = \begin{cases} 1 & \text{if } (i,j) \in \mathcal{E} \\ 0 & \text{otherwise} \end{cases} \qquad (1) $$

The nonzero entries of A then indicate which pairs of nodes are connected by an edge. 
We do not consider self-loops, thus $a_{ii} = 0$. Throughout we will assume that the graph 
is singly connected. Should the original random graph have disconnected pieces, we 
discard all except for the largest connected component. 

A random walk on a graph is a path $\gamma = \{v_0, v_1, \ldots, v_n\}$ made up of successive random 
steps between adjacent nodes $v_i$ on the graph, starting from a given node $v_0 \in \mathcal{V}$. Steps 
are performed according to a transition probability from a node i to an adjacent node 
j given by: 

$$ w_{ij} = \frac{a_{ij}}{k_i} \qquad (2) $$

All adjacent neighbours of i then have equal probability of being reached in a step 
starting from i. In matrix notation we define the transition matrix W as the matrix 
with entries $w_{ij}$. Defining also D as the diagonal matrix with entries $\delta_{ij} k_i$, we have the 
relation: 

$$ W = D^{-1} A \qquad (3) $$

We denote the probability of reaching node j in n steps starting from node i as $G_{ij}(n)$. 
With these definitions, given an n-step random walk $\gamma = \{v_0, v_1, \ldots, v_n\}$, the probability 
of reaching node $v_n$ starting from node $v_0$ along this path is the product: 

$$ \prod_{l=0}^{n-1} \frac{1}{k_{v_l}} = \frac{1}{k_{v_0}} \frac{1}{k_{v_1}} \cdots \frac{1}{k_{v_{n-1}}} \qquad (4) $$
In general, in order to compute $G_{ij}(n)$ one has to consider all possible random walks 
connecting i to j in n steps. Using the transition matrix W we can write this probability 
as: 

$$ G_{ij}(n) = [W^n]_{ij} = [(D^{-1}A)^n]_{ij} \qquad (5) $$
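As an illustration of equations (2), (3) and (5), the following is a minimal sketch, assuming NumPy and a small path graph chosen purely as an example. The function names `transition_matrix` and `propagator` are ours and do not come from the original text.

```python
import numpy as np

def transition_matrix(A):
    """Return W = D^{-1} A for a symmetric 0/1 adjacency matrix A.

    Assumes every node has at least one neighbour (no isolated nodes).
    """
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)              # degrees k_i
    return A / k[:, None]          # w_ij = a_ij / k_i

def propagator(A, n):
    """Return G(n) = W^n; entry (i, j) is the n-step probability i -> j."""
    return np.linalg.matrix_power(transition_matrix(A), n)

# Illustrative example: the path graph 0 - 1 - 2
A_path = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
G2 = propagator(A_path, 2)
```

Each row of `G2` is a probability distribution over end nodes, which gives a quick sanity check on any adjacency matrix passed in.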

3. Average number of distinct sites: general results. 

We are interested in finding the average number of distinct sites $S_i(n)$ visited by a 
random walker taking n steps on a graph starting at node i. 

In this section we derive general results that are valid for any graph topology, including 
in particular the case of d-dimensional lattices. We use the formalism of generating 
functions, a tool that has been used to calculate $S_i(n)$ on lattices [121 IIH] as well as 
other quantities of interest in the study of random walks on networks [T^ [20l [H 1 ^ . 
We denote by $F_{ij}(n)$ the probability of reaching site j for the first time after n steps for 
a random walk starting at site i; note that for the case i = j we define “reaching” as 
“returning to” so that $F_{jj}(0) = 0$. We also define $H_{ij}(n)$ as the probability that site j 
has been visited at least once in n steps by a random walker starting at site i, and let 
$q_j(n)$ be the probability that a walker starting at site j does not return to it within n 
time steps. 

With these definitions the average number of distinct sites visited by time n (i.e. 
after n steps), starting at node i, can be written as: 

$$ S_i(n) = \sum_{j \in \mathcal{V}} H_{ij}(n) \qquad (6) $$
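For small graphs, equation (6) can be evaluated exactly, which is useful as a reference when testing approximations. The sketch below is one possible way to do this, not the method used in the paper: it obtains each $H_{ij}(n)$ by making node j absorbing and reading off the absorbed probability after n steps. It assumes NumPy, a connected graph without isolated nodes, and that the starting node counts as visited, so $H_{ii}(n) = 1$. The function name is hypothetical.

```python
import numpy as np

def distinct_sites_exact(A, i, n):
    """Exact S_i(n) = sum_j H_ij(n) for a small graph with adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    V = A.shape[0]
    W = A / A.sum(axis=1)[:, None]     # transition matrix W = D^{-1} A
    total = 1.0                        # the start node itself: H_ii(n) = 1
    for j in range(V):
        if j == i:
            continue
        # Make j absorbing: once the walker reaches j it stays there.
        Wj = W.copy()
        Wj[j, :] = 0.0
        Wj[j, j] = 1.0
        # Probability of being absorbed in j within n steps is the
        # probability that j has been visited at least once.
        total += np.linalg.matrix_power(Wj, n)[i, j]
    return total
```

On the complete graph with V nodes every step misses a given node with probability (V - 2)/(V - 1), so the result can be compared with 1 + (V - 1)[1 - ((V - 2)/(V - 1))^n]. The cost is V matrix powers, so this is only practical for small V.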

Now if a node j has been visited at least once in a walk of n steps starting at node 
i, we can call the time of the final visit of the walk $m \leq n$ and by definition the walk 
then never returns to j in the remaining n - m steps. 

Thus we can write the convolution: 

$$ H_{ij}(n) = \sum_{m=0}^{n} G_{ij}(m)\, q_j(n-m) \qquad (7) $$

The generating function (or z-transform) of a quantity f(n) is defined as 
$\hat{f}(z) = \sum_{n=0}^{\infty} z^n f(n)$ with $z \in [0,1)$, and has the property that the z-transform of a convolution 
is the product of the z-transforms. The z-transform of (7) is then 

$$ \hat{H}_{ij}(z) = \hat{G}_{ij}(z)\, \hat{q}_j(z) \qquad (8) $$

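We now want to write everything in terms of $\hat{G}_{ij}(z)$ and so need to find a relation 
linking $\hat{q}_j(z)$ to $\hat{G}_{ij}(z)$, which we do via the first passage time probability $F_{jj}(n)$. The 
probability of returning to node j for the first time after exactly n steps can be written 
as: 

$$ q_j(n-1) - q_j(n) = F_{jj}(n) \qquad (9) $$

Taking the z-transform of this expression and noting that $q_j(0) = 1$, 
$\hat{q}_j(z) = \sum_{n=0}^{\infty} q_j(n) z^n$ and $\hat{F}_{jj}(z) = \sum_{n=1}^{\infty} F_{jj}(n) z^n$ we have: 

$$ \sum_{n=1}^{\infty} q_j(n-1) z^n - \sum_{n=1}^{\infty} q_j(n) z^n = \sum_{n=1}^{\infty} F_{jj}(n) z^n \qquad (10) $$

$$ z\hat{q}_j(z) - [\hat{q}_j(z) - 1] = 1 - (1-z)\hat{q}_j(z) = \hat{F}_{jj}(z) \qquad (11) $$

Hence: 

$$ \hat{q}_j(z) = \frac{1 - \hat{F}_{jj}(z)}{1-z} \qquad (12) $$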

We now relate the generator $G_{jj}(n)$ to the first passage time probability $F_{jj}(n)$. The 
probability of arriving at node j in n steps starting at the same node j can be seen as 
the sum of the probabilities grouped according to how often j is visited overall: we can 
reach j for the first time after n steps; or a first time at $n_1 < n$ and a second time after 
another $n - n_1$ steps; or a first time at $n_1 < n$, a second time after another $n_2 - n_1$ 
steps and a third time after a final $n - n_2$ steps, and so on. Mathematically this can be 
written as: 

$$ G_{jj}(n) = F_{jj}(n) + \sum_{n_1=0}^{n} F_{jj}(n_1) F_{jj}(n-n_1) + \sum_{n_2=0}^{n} \sum_{n_1=0}^{n_2} F_{jj}(n_1) F_{jj}(n_2-n_1) F_{jj}(n-n_2) + \ldots \qquad (13) $$

To make the convolution structure clearer, we have included the extreme values (e.g. 
$n_1 = 0$ and $n_1 = n$ in the first sum) here even though, because $F_{jj}(0) = 0$, they do 
not contribute. Taking the z-transform of both sides, with the n = 0 term $G_{jj}(0) = 1$ 
supplying the leading 1, one sees that 

$$ \hat{G}_{jj}(z) = 1 + \hat{F}_{jj}(z) + \hat{F}_{jj}^2(z) + \ldots = \frac{1}{1 - \hat{F}_{jj}(z)} \qquad (14) $$

Substituting this result, (14), into (12) we obtain: 

$$ \hat{q}_j(z) = \frac{1 - \hat{F}_{jj}(z)}{1-z} = \frac{1}{(1-z)\,\hat{G}_{jj}(z)} \qquad (15) $$

This can now be inserted into (8) to give finally the z-transform of the average number 
of distinct sites visited starting from site i: 

$$ \hat{S}_i(z) = \frac{1}{1-z} \sum_{j \in \mathcal{V}} \frac{\hat{G}_{ij}(z)}{\hat{G}_{jj}(z)} \qquad (16) $$


One sees that the underlying quantity of central interest for our problem is $\hat{G}_{ij}(z)$. The 
result of equation (16) is valid in general, i.e. regardless of the graph topology. We 
note that to understand the large n-behaviour of $S_i(n)$ we need to consider $\hat{S}_i(z)$ near 
z = 1. Specifically, if as expected for $V \to \infty$ we have $S_i(n) = Bn$ for large n, then the 
z-transform will diverge for $z \to 1$ as $\hat{S}_i(z) = B/(1-z)^2$. To calculate B we thus need 
to understand the behaviour of $\hat{G}_{ij}(z)$ for $z \to 1$. 
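The link between a linear asymptote and a double pole at z = 1 can be checked in one line. The following is a standard generating-function identity, written here in the notation of this section as a worked step rather than a result taken from the text:

```latex
\sum_{n=0}^{\infty} B\,n\,z^{n}
  = B\,z\,\frac{d}{dz}\sum_{n=0}^{\infty} z^{n}
  = \frac{B\,z}{(1-z)^{2}}
  \simeq \frac{B}{(1-z)^{2}} \qquad (z \to 1)
```

By the same reasoning a quantity that saturates at a constant value c for large n has a transform that diverges only as c/(1 - z).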
4. Average number of distinct sites: random graph results. 

In this section we will derive an expression for $G(n)$, the matrix with entries $G_{ij}(n)$, 
where the dependence on the graph size for large graphs is explicit. Here we will for 
the first time have to restrict the type of graph: as explained below, we require that the 
eigenvalue spectrum of A has a nonzero gap. 

As we saw in section 2, in the case of random graphs we have $G(n) = W^n = (D^{-1}A)^n$ 
and hence $\hat{G}(z) = (1 - zD^{-1}A)^{-1}$, which relates the propagator G to the 
graph topology via the adjacency matrix A. 

To transform to a symmetric matrix whose properties are simpler to understand, 
we rewrite this as 

$$ \hat{G}(z) = D^{-1/2} R(z) D^{1/2} \qquad (17) $$

in terms of the matrix 

$$ R(z) = (1 - zD^{-1/2}AD^{-1/2})^{-1} \qquad (18) $$

This matrix is now clearly symmetric, and we can diagonalize it as 

$$ R = P \Lambda P^{T} \qquad (19) $$

where the matrix P has as columns the eigenvectors of R and $\Lambda$ is a matrix containing 
the eigenvalues of R on the diagonal. 

In terms of the normalized adjacency matrix $M = D^{-1/2}AD^{-1/2}$ [22], one has 

$$ R(z) = (1 - zM)^{-1} \qquad (20) $$

In the following we use Dirac bra-ket notation [23] to denote the eigenvectors $|u_k\rangle$ of 
M. If $|u_k\rangle$ is one such eigenvector and $\lambda_k$ the corresponding eigenvalue, then 

$$ M|u_k\rangle = \lambda_k |u_k\rangle \qquad (21) $$

and it follows that 

$$ R(z)|u_k\rangle = (1 - z\lambda_k)^{-1} |u_k\rangle \qquad (22) $$

In words, R(z) has the same eigenvectors $|u_k\rangle$ as M but with corresponding eigenvalues 
$1/(1 - z\lambda_k)$. 

From spectral graph theory ra we know that the z-independent matrix M has 
eigenvalues all lying in the range [-1, 1]. 

By direct substitution into the eigenvalue equation for M one sees that the vector 
with entries $u_{1,i} = c\sqrt{k_i}$ is an eigenvector with eigenvalue $\lambda_1 = 1$. The constant c is 
found from the normalization condition $\sum_{i} u_{1,i}^2 = 1$ as $c^{-1} = \sqrt{V\langle k\rangle}$ where 
$\langle k\rangle = \sum_{j\in\mathcal{V}} k_j / V$ is the average degree of the graph. If the graph is singly connected 
then there are no other eigenvectors with eigenvalue 1, so we can order the eigenvalues 
as 

$$ 1 = \lambda_1 > \lambda_2 \geq \ldots \geq \lambda_V \geq -1 \qquad (23) $$

(The fact that the eigenvalues lie between -1 and 1 can also be seen from the Perron-
Frobenius theorem [211I2S], given that the entries of $|u_1\rangle$ are all positive and $\lambda_1 = 1$.) 
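These spectral properties are easy to confirm numerically on a small example. The sketch below assumes NumPy; the four-node graph (a triangle with one pendant node) and the helper name `normalized_adjacency` are our own choices for illustration.

```python
import numpy as np

def normalized_adjacency(A):
    """Return M = D^{-1/2} A D^{-1/2} and the degree vector k."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    d = 1.0 / np.sqrt(k)
    return d[:, None] * A * d[None, :], k

# Illustrative graph: triangle 0-1-2 plus a pendant node 3 attached to node 2
A_example = np.array([[0, 1, 1, 0],
                      [1, 0, 1, 0],
                      [1, 1, 0, 1],
                      [0, 0, 1, 0]])
M, k = normalized_adjacency(A_example)

lam, U = np.linalg.eigh(M)        # eigenvalues in ascending order
u1 = np.sqrt(k / k.sum())         # c * sqrt(k_i) with c = 1 / sqrt(V <k>)
```

Here `lam[-1]` is the leading eigenvalue, and `u1` should satisfy `M @ u1 == u1` up to rounding. Because the example graph is connected, the second largest eigenvalue `lam[-2]` is strictly below 1.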
Splitting off the contribution from $\lambda_1$, we can now write the eigenvector 
decomposition of R(z) as 

$$ R(z) = |u_1\rangle\langle u_1| \frac{1}{1-z} + \sum_{k=2}^{V} |u_k\rangle\langle u_k| \frac{1}{1 - z\lambda_k} \qquad (24) $$

and clearly the first term will be dominant in the limit $z \to 1$ that we need to consider. 
With the shorthand 

$$ C(z) = \sum_{k=2}^{V} |u_k\rangle\langle u_k| \frac{1}{1 - z\lambda_k} \qquad (25) $$

for the second term, we can then write 

$$ R_{ij}(z) = \frac{\sqrt{k_i k_j}}{V\langle k\rangle} \frac{1}{1-z} + C_{ij}(z) \qquad (26) $$

From equation (17) we have $\hat{G}_{ij}(z) = (k_j/k_i)^{1/2} R_{ij}(z)$, so the analogous representation 
for $\hat{G}(z)$ reads 

$$ \hat{G}_{ij}(z) = \frac{k_j}{V\langle k\rangle} \frac{1}{1-z} + \left(\frac{k_j}{k_i}\right)^{1/2} C_{ij}(z) \qquad (27) $$
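The split of R(z) into a leading pole plus the remainder C(z), as in (26), together with the relation between the propagator and R(z) from (17), can be verified directly for a finite graph. This is a minimal sketch assuming NumPy; the function name and the example graph are hypothetical, and the check relies on the graph being connected so that the eigenvalue 1 is non-degenerate.

```python
import numpy as np

def resolvent_decomposition(A, z):
    """Return (R, leading, C) with R = (1 - zM)^{-1} = leading + C."""
    A = np.asarray(A, dtype=float)
    V = A.shape[0]
    k = A.sum(axis=1)
    sqrt_kk = np.sqrt(np.outer(k, k))
    M = A / sqrt_kk                                   # D^{-1/2} A D^{-1/2}
    R = np.linalg.inv(np.eye(V) - z * M)
    lam, U = np.linalg.eigh(M)                        # ascending; last is lambda_1 = 1
    U_rest, lam_rest = U[:, :-1], lam[:-1]
    # C(z): sum over all modes except the leading one
    C = (U_rest / (1.0 - z * lam_rest)) @ U_rest.T
    # leading term sqrt(k_i k_j) / (V <k> (1 - z)), using V <k> = sum_i k_i
    leading = sqrt_kk / (k.sum() * (1.0 - z))
    return R, leading, C

# Illustrative graph: triangle 0-1-2 plus a pendant node 3 attached to node 2
A_demo = np.array([[0, 1, 1, 0],
                   [1, 0, 1, 0],
                   [1, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
R, leading, C = resolvent_decomposition(A_demo, z=0.9)
```

As z approaches 1 the `leading` term grows without bound while `C` stays finite, which is the behaviour exploited in the limits taken in this section.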

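We can now substitute these expressions into equation (16) to obtain: 

$$ \hat{S}_i(z) = \frac{1}{1-z} \sum_{j\in\mathcal{V}} \left[ \frac{k_j}{R_{jj}(z)\, V\langle k\rangle (1-z)} + \frac{(k_j/k_i)^{1/2}\, C_{ij}(z)\, V\langle k\rangle (1-z)}{k_j + C_{jj}(z)\, V\langle k\rangle (1-z)} \right] \qquad (28) $$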


In the following we will consider first the limit $V \to \infty$ and then the limit $z \to 1$. 
This order of taking the two limits is important to get physical results, as we explain 
in more detail below. Note that the denominators in the two terms of (28) are identical 
but written in two different forms that will make the limit procedure clearer. 

The large V-limit is simple to take in (26), giving $\lim_{V\to\infty} R_{jj}(z) = C_{jj}(z)$. We are 
assuming implicitly here that C(z) has a well-defined limit for $V \to \infty$. This requires 
in particular that $\lambda_2$ stays away from 1, i.e. that the spectrum of M has a nonzero gap 
$1 - \lambda_2$ between the leading and first subleading eigenvalue for $V \to \infty$. This is generally 
true for regular [261 ET], ER [2H1 EH] and scale-free [301 EH] random graphs, but not 
for lattices, where the eigenvectors are Fourier modes whose eigenvalue approaches 1 
smoothly in the large wavelength (zero wavevector) limit. 

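In the second term of (28), the first term in the denominator can be neglected for 
$V \to \infty$ at fixed z < 1, giving 

$$ \lim_{V\to\infty} \hat{S}_i(z) = \frac{1}{1-z} \sum_{j\in\mathcal{V}} \left[ \frac{k_j}{R_{jj}(z)\, V\langle k\rangle (1-z)} + \left(\frac{k_j}{k_i}\right)^{1/2} \frac{C_{ij}(z)}{C_{jj}(z)} \right] \qquad (29) $$

Now we take the limit $z \to 1$, in which the second term becomes negligible compared to 
the first. With the assumption of a nonzero gap, $C_{jj}(z)$ also has a finite limit for $z \to 1$ 
so that we can define 

$$ \lim_{z\to 1} \lim_{V\to\infty} R_{jj}(z) = \lim_{z\to 1} C_{jj}(z) = R_j \qquad (30) $$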
and get finally 

$$ \lim_{V\to\infty} \hat{S}_i(z) = \frac{1}{V\langle k\rangle (1-z)^2} \sum_{j\in\mathcal{V}} \frac{k_j}{R_j} \qquad (31) $$

as the asymptotic behaviour for $z \to 1$. 

This has exactly the $1/(1-z)^2$ divergence we were expecting, and gives us the 
prefactor of the large n-asymptote of the number of distinct sites visited: 

$$ \lim_{V\to\infty} S_i(n) = Bn \qquad (32) $$

where 

$$ B = \frac{1}{V\langle k\rangle} \sum_{j\in\mathcal{V}} \frac{k_j}{R_j} \qquad (33) $$

We can make three observations. Firstly, if we had inverted the order of taking the limits 
and fixed V while taking $z \to 1$, then we would have had $R_{jj}(z) = k_j/[V\langle k\rangle(1-z)]$ to 
leading order. The second term in (28) would have disappeared in the limit, so that 

$$ \hat{S}_i(z) = \frac{1}{1-z} \sum_{j\in\mathcal{V}} \frac{k_j}{R_{jj}(z)\, V\langle k\rangle (1-z)} = \frac{V}{1-z} \qquad (34) $$

to leading order near z = 1. This $1/(1-z)$ divergence of $\hat{S}_i(z)$ implies 
$\lim_{n\to\infty} S_i(n) = V$, a result which is clear intuitively: if we keep the graph size finite 
then in the limit of large times the random walk will cover the entire graph, i.e. visit all 
nodes at least once. 

Secondly, from equation (30) we can see that the information one needs to calculate B 
resides in the quantities $C_{jj}(z) = \sum_{k=2}^{V} u_{k,j}^2/(1 - z\lambda_k)$, where the $u_{k,j}$ are the components 
of the eigenvectors $|u_k\rangle$ of M and the $\lambda_k$ the eigenvalues. So knowing the full spectrum 
of M and the associated eigenvector statistics would in principle solve our problem of 
determining B. While this is feasible computationally for finite and not too large V, we 
are not aware of a method that would work in the thermodynamic limit $V \to \infty$. 
Thirdly, although the index i appears on the left hand side of equation (32), representing 
the initial node of the walk, it does not appear on the right. This means that the average 
number of distinct sites visited in the large n limit does not depend on the starting node, 
and therefore we can drop the index i from the left hand side of (32). In particular, 
even for graphs with broad degree distributions such as scale-free graphs, the number 
of distinct sites visited will be the same whether we start the walk from a hub (a node 
with high degree) or a dangling end of the graph (a node with degree one) - provided 
of course n is large enough. 


5. The message-passing equations. 

From expression (33) we see that, for a given graph, we need to calculate the quantity 
$R_j$. Although we know the entries of the inverse $[R^{-1}(z)]_{ij} = \delta_{ij} - z a_{ij} (k_i k_j)^{-1/2}$, it is not 
straightforward to characterize $R_{jj}(z)$. We could find the value $R_j$ either by calculating 
$R_j = \lim_{z\to 1} C_{jj}(z)$, where $C_{jj}(z) = \sum_{k=2}^{V} u_{k,j}^2/(1 - z\lambda_k)$, or by directly inverting the matrix 
$R^{-1}(z) = [1 - zD^{-1/2}AD^{-1/2}]$. Unfortunately both of these methods are prohibitive 
computationally, already for individual graphs of large size V and even more so if in 
addition we want to average the results over an ensemble of random graphs. 

Our aim, then, is to find a viable alternative method that will allow us to 
characterize the value of $R_{jj}(z)$, and thus calculate $\lim_{V\to\infty} S(n)$ through (32) and 
(33). We draw for this on methods that have been deployed in the calculation of 
sparse random matrix spectra [18]. That a connection to spectral problems should 
exist is suggested by the fact that $zR(z) = (z^{-1}\mathbb{1} - D^{-1/2}AD^{-1/2})^{-1}$: up to a trivial 
rescaling, R(z) has the structure of a resolvent (with parameter $z^{-1}$) for the random 
matrix $D^{-1/2}AD^{-1/2}$, and it is from such resolvents that spectral information is normally 
derived, in an approach that in the statistical physics literature goes back to at least 
Edwards and Jones [31]. Accordingly the two steps we will need to take mirror closely 
those used to find resolvents of sparse random matrices in [18]: we first write the $R_{jj}(z)$ 
as variances in a Gaussian distribution with inverse covariance matrix $R^{-1}(z)$, and then exploit 
the fact that this distribution has a graphical model structure to derive cavity equations 
from which these variances can be found. 

5.1. Multivariate Gaussian representation. 

The first step is simple: we define a vector of random variables $x = (x_1, \ldots, x_V)$ and assign 
to this the zero mean Gaussian distribution 

$$ P(x) \propto e^{-x^{T} R^{-1}(z)\, x/2} = e^{-x^{T}(1 - zD^{-1/2}AD^{-1/2})\, x/2} \qquad (35) $$

The marginal distribution of any component of the vector, obtained by integrating P(x) 
over all other components, is then also Gaussian: 

$$ P(x_j) \propto e^{-x_j^2/(2v_j)} \qquad (36) $$

with variance $v_j = \langle x_j^2\rangle = R_{jj}(z)$. Our goal is now to calculate these marginal variances 
efficiently, i.e. without a full matrix inversion. 

The key property of the probability distribution (35) is that it can be written in 
the form 

$$ P(x) \propto \prod_{i\in\mathcal{V}} e^{-x_i^2/2} \prod_{(ij)\in\mathcal{E}} e^{z x_i x_j/\sqrt{k_i k_j}} \qquad (37) $$

As this factorizes into contributions associated with the nodes and edges of the 
underlying graph, it defines what is known as a graphical model [19]. On such a 
graphical model, marginal distributions can be obtained using message-passing, or 
cavity, equations. 

5.2. Cavity equations. 

For completeness, we summarize briefly the derivation of the message-passing equations, 
also known as sum-product algorithm ng. We focus on trees, i.e. graphs that do not 
contain any loops, where the equations are exact, and leave for later a discussion of the 
extent to which they apply also to large random graphs. Write generally $\phi_i(x_i)$ for the 
factor in P(x) associated with node i and $\psi_{ij}(x_i, x_j)$ as the interaction term between 
nodes i and j. In our case we have: 

$$ \psi_{ij}(x_i, x_j) = e^{z x_i x_j/\sqrt{k_i k_j}} \qquad (38) $$

$$ \phi_i(x_i) = e^{-x_i^2/2} \qquad (39) $$

To calculate the marginal distribution of $x_j$, we could imagine first removing all 
edge factors $\psi_{ij}(x_i, x_j)$ from P(x), where i runs over all neighbours of j. The tree is now 
split into subtrees rooted at each neighbour i, and one can define the cavity marginal of 
i, $\nu_{i\to j}(x_i)$, as the marginal that is obtained from a (suitably renormalized) probability 
distribution containing only the factors from the relevant subtree. To get the marginal 
of $x_j$, we now just need to reinstate the missing edge factors as well as the node factor 
at j and integrate over the values of the nodes that we have not yet marginalized over, 
namely, the neighbours i: 

$$ P(x_j) \propto \phi_j(x_j) \prod_{i\in\partial j} \int dx_i\, \psi_{ij}(x_i, x_j)\, \nu_{i\to j}(x_i) \qquad (40) $$

One can call the quantities $\nu_{i\to j}(x_i)$ messages sent from i to j, or cavity marginals: each 
message tells node j what the marginal of its neighbour i would have been if the edge 
between them had been severed. 

The cavity marginals can now be obtained from an analogous relation. To get 
$\nu_{i\to j}(x_i)$, one can think of removing all edges connecting i to its neighbours l other than 
j; note that the edge connecting i to j has already been taken out in the definition of 
the cavity marginal. This generates independent subtrees rooted at the neighbours l, 
and the marginals at these nodes are $\nu_{l\to i}(x_l)$. Reinstating removed edge factors and 
marginalizing over neighbours then yields 

$$ \nu_{i\to j}(x_i) \propto \phi_i(x_i) \prod_{l\in\partial i\setminus j} \int dx_l\, \psi_{li}(x_l, x_i)\, \nu_{l\to i}(x_l) \qquad (41) $$

On a tree these equations can be solved by e.g. starting at leaf nodes, where simply 
$\nu_{i\to j}(x_i) \propto \phi_i(x_i)$, and then sweeping through the tree in a way that calculates each 
message once messages have been received from all neighbours except the intended 
recipient of the message. Note that two messages are needed per edge, one in each 
direction. Once all messages have been found, the marginals can be deduced from (40). 

On graphs with loops, the message-passing equations (40) and (41) are no longer 
exact: when we remove all edges around a node, its neighbours may then still be correlated 
because of loops, and we cannot factorize their joint distribution into a product of cavity 
marginals. The cavity method, also known as Bethe-Peierls approximation [19], consists 
in neglecting such correlations. The set of equations (41) for the cavity marginals is then 
viewed as a set of fixed point equations that typically have to be iterated to convergence 
(see below). Clearly the marginals we deduce in the end are approximate. Nevertheless 
the method remains useful for us because we expect the approximation to become exact 
for random graphs in the limit of large V. The reason is that typical loop lengths 
diverge (logarithmically) with V, so that the graphs become locally tree-like [18, 32]. 
The correlations that the cavity method ignores then weaken as V grows, making the 
approach exact for large V. 

Specializing now to our Gaussian graphical model, the cavity marginals must also 
be Gaussian and we can write them as 

$$ \nu_{i\to j}(x_i) \propto e^{-x_i^2/(2 v_i^{(j)})} \qquad (42) $$

which defines the cavity variances $v_i^{(j)}$. Inserting (39) and (38) into the general message 
passing equation (41) and carrying out the resulting Gaussian integrals gives then 

$$ v_i^{(j)} = k_i \left[ k_i - z^2 \sum_{l\in\partial i\setminus j} \frac{v_l^{(i)}}{k_l} \right]^{-1} \qquad (43) $$

while for the full marginals one obtains analogously 

$$ v_j = k_j \left[ k_j - z^2 \sum_{i\in\partial j} \frac{v_i^{(j)}}{k_i} \right]^{-1} \qquad (44) $$

These two relations are the direct analogues of Eqs. (11) and (12) in [18]. 

The variances $v_j$, when calculated in the limit $z \to 1$, are the quantity of interest 
for our problem as $v_j = \langle x_j^2\rangle = R_j$. They are known once the cavity variances have been 
obtained by solving (43). 
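Since the cavity equations are exact on trees, a small tree gives a direct test of the cavity variance recursion against a full matrix inversion at some z < 1. The sketch below assumes NumPy and a graph given as a dict mapping each node to a list of neighbours. The function names, the six-node tree and the synchronous update schedule are our own illustrative choices. The recursion is written as $v_i^{(j)} = [1 - (z^2/k_i) \sum_l v_l^{(i)}/k_l]^{-1}$, which follows from integrating out each neighbour in the Gaussian model.

```python
import numpy as np

def cavity_variances(neighbors, z, sweeps=50):
    """Marginal variances v_j from the Gaussian cavity recursion at given z."""
    k = {i: len(nb) for i, nb in neighbors.items()}
    # v[(i, j)] holds the cavity variance v_i^{(j)} carried by message i -> j
    v = {(i, j): 1.0 for i in neighbors for j in neighbors[i]}
    for _ in range(sweeps):
        # synchronous update of all messages from the previous sweep
        v = {(i, j): 1.0 / (1.0 - (z**2 / k[i]) *
                            sum(v[(l, i)] / k[l] for l in neighbors[i] if l != j))
             for (i, j) in v}
    # full marginals: same expression but summing over all neighbours
    return {j: 1.0 / (1.0 - (z**2 / k[j]) *
                      sum(v[(i, j)] / k[i] for i in neighbors[j]))
            for j in neighbors}

def exact_variances(neighbors, z):
    """Reference values R_jj(z) from direct inversion of 1 - zM."""
    nodes = sorted(neighbors)
    idx = {n: a for a, n in enumerate(nodes)}
    k = np.array([len(neighbors[n]) for n in nodes], dtype=float)
    A = np.zeros((len(nodes), len(nodes)))
    for i in nodes:
        for j in neighbors[i]:
            A[idx[i], idx[j]] = 1.0
    M = A / np.sqrt(np.outer(k, k))
    R = np.linalg.inv(np.eye(len(nodes)) - z * M)
    return {n: R[idx[n], idx[n]] for n in nodes}

# Illustrative tree with edges 0-1, 0-2, 2-3, 2-4, 4-5
tree = {0: [1, 2], 1: [0], 2: [0, 3, 4], 3: [2], 4: [2, 5], 5: [4]}
v_cavity = cavity_variances(tree, z=0.9)
v_exact = exact_variances(tree, z=0.9)
```

On a tree the two dictionaries should agree to numerical precision once the number of sweeps exceeds the tree diameter. On graphs with loops the cavity values are only approximate.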

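In practice we use the rescaled cavity variances 

$$ m_{i\to j} = \frac{v_i^{(j)}}{k_i} \qquad (45) $$

as messages from node i to node j. With this definition and using (43) for $z \to 1$ the 
cavity equations are: 

$$ m_{i\to j} = \left( k_i - \sum_{l\in\partial i\setminus j} m_{l\to i} \right)^{-1} \qquad (46) $$

We solve these by iteration according to 

$$ m_{i\to j}^{(t+1)} = \left( k_i - \sum_{l\in\partial i\setminus j} m_{l\to i}^{(t)} \right)^{-1} \qquad (47) $$

where t represents a discrete iteration time step. 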

Starting from a given graph $\mathcal{G}$, a suitably chosen convergence criterion and a 
maximum iteration time $T_{\max}$, the algorithm then works as follows: 

(i) Initialize the messages $m_{i\to j}^{(0)}$ randomly. 
(ii) Run through all edges (ij) and find for each the updated messages $m_{i\to j}^{(t+1)}$, $m_{j\to i}^{(t+1)}$ 
from (47). 

(iii) Increase t by one. 

(iv) Repeat steps (ii) and (iii) until either convergence is reached or $t = T_{\max}$. 


If convergence is reached, i.e. the preset convergence criterion is satisfied, one can collect 
the results and calculate the variances $v_j$ using (44) and (45): 

$$ v_j = k_j \left( k_j - \sum_{i\in\partial j} m_{i\to j} \right)^{-1} \qquad (48) $$

where $m_{i\to j}$ are the converged messages. 

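If we identify $v_j = \langle x_j^2\rangle = R_j$ we can then also express directly the prefactor (33) 
in the linear asymptote in the number of distinct sites visited, S(n) = Bn, as 

$$ B = \frac{1}{V\langle k\rangle} \sum_{j\in\mathcal{V}} \frac{k_j}{v_j} \qquad (49) $$

$$ \phantom{B} = \frac{1}{V\langle k\rangle} \sum_{j\in\mathcal{V}} \left( k_j - \sum_{i\in\partial j} m_{i\to j} \right) \qquad (50) $$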
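The iteration steps (i) to (iv) and the final evaluation of B can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the graph is a dict of neighbour lists, all messages are updated in parallel, initial messages are drawn uniformly from [0, 0.5), and convergence is declared as soon as the largest message change drops below `eps`. B is computed as $(1/V\langle k\rangle) \sum_j (k_j - \sum_{i\in\partial j} m_{i\to j})$, which is what follows from (33) with $R_j = v_j$ and $v_j$ taken from (48). All names and default values are hypothetical.

```python
import random

def cavity_prefactor(neighbors, eps=1e-10, t_max=10_000, seed=0):
    """Iterate the z = 1 cavity equations and return (B, messages)."""
    rng = random.Random(seed)
    k = {i: len(nb) for i, nb in neighbors.items()}
    # (i) random initial messages m_{i->j} (range [0, 0.5) is an assumption)
    m = {(i, j): 0.5 * rng.random() for i in neighbors for j in neighbors[i]}
    for _ in range(t_max):
        # (ii) parallel update of every directed edge
        new = {(i, j): 1.0 / (k[i] - sum(m[(l, i)] for l in neighbors[i] if l != j))
               for (i, j) in m}
        change = max(abs(new[e] - m[e]) for e in m)
        m = new                      # (iii) advance the iteration time
        if change < eps:             # (iv) stop once converged
            break
    else:
        raise RuntimeError("cavity iteration did not converge within t_max sweeps")
    V = len(neighbors)
    mean_k = sum(k.values()) / V
    # B = (1 / (V <k>)) * sum_j (k_j - sum_{i in neighbours of j} m_{i->j})
    B = sum(k[j] - sum(m[(i, j)] for i in neighbors[j]) for j in neighbors) / (V * mean_k)
    return B, m

# Arithmetic check on a small 4-regular graph (the complete graph on 5 nodes)
k5 = {i: [j for j in range(5) if j != i] for i in range(5)}
B_k5, messages_k5 = cavity_prefactor(k5)
```

For a k-regular graph the messages settle at 1/(k - 1), so with k = 4 the value of `B_k5` should be close to 2/3. The complete graph is used here only to test the arithmetic; on such a small, loopy graph the cavity value is not expected to describe an actual random walk.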


There is one subtlety here that we have glossed over: the variances $v_j$ are the full 
marginal variances $R_{jj}(z)$, which from (26) have the form $k_j/[V\langle k\rangle(1-z)] + C_{jj}(z)$. In 
the calculation of B we need $R_j = \lim_{z\to 1} C_{jj}(z)$, where the contribution $\propto (1-z)^{-1}$ 
has been removed. Where we have taken the limit $z \to 1$ above, we therefore implicitly 
mean that 1 - z needs to lie in the range $1/V \ll 1 - z \ll 1$ where the divergent 
contribution to $R_{jj}(z)$ is still small enough to be neglected compared to $C_{jj}(z)$. That 
it is then allowable nevertheless to set z = 1 directly in the cavity equations that we 
solve is something that has to be checked numerically: we do indeed always find finite 
marginals $v_j$ from converged solutions for the cavity marginals. The divergent solution 
also exists as a separate fixed point, namely the trivial solution $m_{i\to j} = 1$ of (46), but 
is not accessed in our iterative solution method. 


5.3. Regular graph case. 

Before going on to numerical results for more general random graph ensembles, we briefly 
use the expression for the topology dependent prefactor (50) to consider the particular 
case of a regular graph, i.e. a graph where $\forall i \in \mathcal{V}$ we have $k_i = k$. In the infinite graph 
size limit the graph is then effectively (up to negligible long loops) a regular tree, where 
each node is equivalent to all others. The quantities of interest in (46), (48) must then 
be the same $\forall i \in \mathcal{V}$: we can write $k_i = k$, $m_{i\to j} = m$ and $v_j = v$. The fixed 
point cavity equations (46) thus reduce to: 

$$ m = \left( k - \sum_{l\in\partial i\setminus j} m \right)^{-1} = [k - (k-1)m]^{-1} $$

We obtain a second order equation in m: 

$$ m^2(k-1) - mk + 1 = 0 \qquad (51) $$

with solutions m = 1/(k - 1) or m = 1. The first solution is the one we require; the 
second one is the trivial solution discussed above that gives divergent variances in (48). 
From m = 1/(k - 1) one can find the cavity variances and from there the full variances 

$$ v = \frac{k-1}{k-2} \qquad (52) $$

Substituting into the expression (50) for the prefactor B we obtain: 

$$ B = \frac{k-2}{k-1} \qquad (53) $$


This result agrees with the one derived for Bethe lattices of connectivity k [12]. This is 
as expected, given that the cavity method is exact on tree graphs. 
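The intermediate algebra for the regular case is short enough to write out. The following worked step uses only the relation between full variances and messages, $v_j = k_j/(k_j - \sum_{i\in\partial j} m_{i\to j})$, and the definition of B in terms of the $v_j$; the numerical example with k = 3 is ours.

```latex
m = \frac{1}{k-1}
\;\Longrightarrow\;
v = \frac{k}{k - k\,m} = \frac{k-1}{k-2},
\qquad
B = \frac{1}{V k} \sum_{j \in \mathcal{V}} \frac{k}{v} = \frac{1}{v} = \frac{k-2}{k-1}
```

For example, k = 3 gives m = 1/2, v = 2 and B = 1/2, while for large k the prefactor approaches 1, i.e. almost every step reaches a new site.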


6. Simulations. 

We performed numerical simulations to test the predictions from our cavity approach 
for the number of distinct sites visited. We used four types of graph structures: 
regular random graphs (Reg), Erdős-Rényi (ER) [33], scale-free (SF) using a preferential 
attachment scheme [16] and a dedicated graph ensemble (RER) where graphs are built 
starting from a $k_0$-regular random graph, with edges then added independently with 
probability p as in the ER model; if d = pV then the final average degree of such a 
graph is $\langle k\rangle = k_0 + d$ for large V. This graph ensemble thus interpolates between the 
regular and ER cases and is similar to the one analyzed in [34, 15] with the difference that 
here we start from a regular graph instead of a ring or a lattice. As for the preferential 
attachment we used the following procedure: start with a graph of $m_0$ vertices and 
introduce sequentially $V - m_0$ new vertices by attaching each of them to m already 
existing nodes. The probability to pick a certain node i as one of these m neighbors 
is proportional to its degree, $P(k_i) \sim k_i$, thus high degree nodes will be more likely 
to be picked and hence they will increase their degree while the graph grows. This 
scheme leads to a power-law degree distribution $P(k) \sim k^{-\gamma}$ with $\gamma = 2.9 \pm 0.1$ [16]: 
we empirically observe this value in our simulations. We also tried other generation 
methods for scale-free graphs that yield different values of $\gamma$, but as the results were 
qualitatively similar to those for preferential attachment we only show the latter as 
representative for our scale-free graph simulations. 
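The preferential attachment procedure described above can be sketched in a few lines. Several details are not specified in the text and are assumptions here: the $m_0$ seed vertices are taken to form a complete graph so that all initial degrees are positive, and the m distinct targets of each new vertex are drawn one at a time with probability proportional to degree, rejecting repeats. The function name and the parameter values in the example are hypothetical.

```python
import random

def preferential_attachment(V, m, m0, seed=0):
    """Grow a graph with V nodes; returns a dict node -> set of neighbours."""
    if m < 1 or m > m0 or m0 < 2 or V < m0:
        raise ValueError("need 1 <= m <= m0, m0 >= 2 and V >= m0")
    rng = random.Random(seed)
    # Assumption: the m0 seed nodes start out fully connected.
    neighbors = {i: {j for j in range(m0) if j != i} for i in range(m0)}
    # 'stubs' lists each node once per unit of degree, so a uniform draw
    # from it selects node i with probability proportional to k_i.
    stubs = [i for i in range(m0) for _ in range(m0 - 1)]
    for new in range(m0, V):
        targets = set()
        while len(targets) < m:          # reject repeated targets
            targets.add(rng.choice(stubs))
        neighbors[new] = set(targets)
        for t in targets:
            neighbors[t].add(new)
            stubs.extend([t, new])       # both endpoints gain one degree
    return neighbors

graph_sf = preferential_attachment(V=200, m=3, m0=4, seed=1)
```

The resulting graph is connected by construction, since every new vertex attaches to existing ones. To feed it into a routine that expects neighbour lists, convert each set with `list(...)`.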

For each of these graph topologies we investigated three fixed sizes V = 10^, 10^, 10^ 
and different average degrees. For ER graphs we only used the giant connected 
component of each graph sampled, but the average degrees we consider are large enough 
($\langle k\rangle \geq 4$) for this to reduce V by at most 2%. The other types of graph have 
only one connected component by construction. For each given graph we evaluated 
the cavity prediction (50) from a converged solution of the cavity equations (46). 
The iterative solution using (47) converged quickly, in typically around 10 iteration 
steps. We used as convergence criterion the following: convergence is reached if 
$\max_{(ij)\in\mathcal{E}} |m_{i\to j}^{(t+1)} - m_{i\to j}^{(t)}| < \epsilon$ for y consecutive times, where we set y = 10 and $\epsilon$ = 10“^. 

The results for B were averaged over 1,000 different graph instances for V = 10^, 10^ 
and 100 instances for the bigger graphs of size V = 10^. 

The cavity predictions were compared against direct simulations of unbiased 
random walks. Each walk starts at a randomly picked vertex and we keep track of the 
number of distinct visited sites as the walk progresses, with individual steps performed 
using the transition probabilities $w_{ij} = a_{ij}/k_i$ defined in section 2. We averaged the results 
over the same graph instances as used to generate the cavity predictions. Note that 
for each instance of a given graph type, only a single walk was performed starting from 
a randomly chosen initial site. Note also that while the cavity prediction depends only on 
the topology of each graph, for the direct simulations there is an additional source of 
randomness arising from the particular random walk trajectory that is obtained on a 
given graph. 
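A direct simulation of this kind needs little more than a set of visited nodes. The sketch below is an illustration under our own assumptions: the graph is a dict mapping each node to a list of neighbours, the start node counts as visited at n = 0, and the function name and seed handling are hypothetical.

```python
import random

def distinct_sites_trajectory(neighbors, n_steps, start, seed=0):
    """Return [S(0), S(1), ..., S(n_steps)] for one unbiased random walk."""
    rng = random.Random(seed)
    current = start
    visited = {start}
    counts = [1]                                   # the start node is visited
    for _ in range(n_steps):
        # every neighbour of the current node is chosen with probability 1/k_i
        current = rng.choice(neighbors[current])
        visited.add(current)
        counts.append(len(visited))
    return counts

# Illustrative example: a ring of 10 nodes
ring10 = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
trajectory = distinct_sites_trajectory(ring10, n_steps=50, start=0, seed=42)
```

Averaging such trajectories over graph instances and start nodes gives an estimate of S(n) that can be plotted against Bn. A single trajectory is nondecreasing, grows by at most one per step and can never exceed the number of nodes.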

The issue of how the cavity predictions depend on graph size V deserves a brief 
comment. We argued that the method should become exact in the limit $V \to \infty$, and 
so a priori one should extrapolate our predictions for B to this limit. We found, however, 
that for our relatively large graph sizes the predictions for different V agreed within the 
error bars. Thus we did not perform a systematic extrapolation and simply used the 
predictions for V = 10^, as the largest graph size for which we could obtain a statistically 
large sample (1000 graph instances) of data. The fact that already V = 10^, our smallest 
size, is large enough to obtain results that are essentially indistinguishable from those 
for $V \to \infty$ is consistent with findings from cavity predictions in other contexts, see 
e.g. [35, 36]. An alternative approach to evaluating the cavity predictions would have 
been to move from specific graph instances to solving the limiting ($V \to \infty$) integral 
equations for the distribution of messages across the graph. These equations can be read 
off more or less directly from the cavity equations, see e.g. [35, 37], or obtained from 
replica calculations [38] and then solved numerically using population dynamics. Given 
the good agreement between the predictions for our three different V this approach 
would be expected to give identical predictions, so we did not pursue it. 

6.1. Simulations versus cavity predictions. 

Our first task is to verify that the cavity equations do indeed correctly predict the
prefactor B for random walks on large graphs. In figure 1 we plot the average number
of sites S(n) visited for ER graphs of average degree ⟨k⟩ = 4, 7 and 10. We plot S(n)
versus Bn, with B the value taken from the cavity predictions, so that the data points
should lie on the diagonal y = x if the cavity predictions are accurate. We see in figure 1
that this is indeed the case for graphs of size V = 10^. Similar levels of agreement are
obtained for the other graph ensembles and sizes. The numerical data thus fully support
our argument that the cavity predictions will be exact for large V, and show that in fact V
does not have to be excessively large to reach good quantitative agreement between the
predictions and direct simulations.



Figure 1: Average number of distinct sites visited, S(n), for random walks on ER graphs
of size V = 10"^. S(n) is plotted against Bn, with the prefactor B as predicted by the
cavity method (50), for different average degrees ⟨k⟩ = 4, 7, 10 as shown in the legend. In
the linear regime, before the random walk starts to saturate the graph, data points lie on
the diagonal, showing excellent agreement between predictions and direct simulations.


6.2. Dependence on graph topology. 

We next look more systematically at how the prefactor B in the large-n behaviour
S(n) = Bn depends on the topology of the graphs we study. In figure 2 we report
the dependence of the cavity prediction for B on the average node degree ⟨k⟩, for the four
different graph ensembles we studied. We found that for each graph type a hyperbolic
fit of the form B(⟨k⟩) = (⟨k⟩ − c_1)/(⟨k⟩ − c_2) gives a good description of the data, with
the parameters c_1, c_2 dependent on the graph topology but with best-fit values always
satisfying c_1 = c_2 + 1. Thus we could interpret the generic graph result as the one for
a regular graph with effective degree ⟨k⟩ − c_2 + 1. This is intriguing as it suggests that
the effect of changing the average degree is quite similar between the different graph types.
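
The effective-degree interpretation can be made explicit in one line of algebra. The short derivation below assumes that the hyperbolic fit has the form B(⟨k⟩) = (⟨k⟩ − c_1)/(⟨k⟩ − c_2), which is the form consistent with the constraint c_1 = c_2 + 1 and with the regular-graph result; the symbol k_eff is introduced here purely for illustration.

```latex
% Assumed fit form: B(<k>) = (<k> - c_1) / (<k> - c_2), with c_1 = c_2 + 1.
\begin{align}
B(\langle k\rangle)
  &= \frac{\langle k\rangle - c_1}{\langle k\rangle - c_2}
   = \frac{\langle k\rangle - c_2 - 1}{\langle k\rangle - c_2} \\
  &= \frac{k_{\mathrm{eff}} - 2}{k_{\mathrm{eff}} - 1},
  \qquad k_{\mathrm{eff}} = \langle k\rangle - c_2 + 1 .
\end{align}
% Regular graphs correspond to c_1 = 2, c_2 = 1, so that k_eff = k.
```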

Looking at quantitative differences between graph ensembles, we observe that the
prefactor B is smallest for given connectivity (average degree) when the graph is regular.
Heterogeneity in the node degrees thus generically seems to increase the number of
distinct sites a random walk will visit, a result that seems to us non-trivial and would
be interesting to investigate as a broader conjecture: could there be a lower bound
B ≥ (⟨k⟩ − 2)/(⟨k⟩ − 1)? If this were the case, one may wonder whether this is related
to the spectral gap of a given graph, which is maximal for regular random graphs
[39, 40]. Indeed the impact of the gap would appear in the numerator of the prefactor
(33) through equation (30) and the definition of $C_{jj}(z)$, which depends on the
eigenvalues λ through factors of the form 1 − zλ.

Nonetheless the gap contribution could be balanced by the squares of the eigenvector
entries $u_j^{\lambda}$ of the matrix R, which can be of order O(1) or O(1/V) depending on
whether the eigenvector is localized or delocalized, respectively. For instance, scale-free
graphs have been shown empirically to have localized eigenvectors (when considering the
adjacency matrix), i.e. only a few eigenvector entries are non-zero and these correspond
to the high-degree nodes [28], whereas for ER graphs the amplitude of the eigenvector
entries is evenly distributed among all the nodes; this difference can be detected for
instance by calculating the inverse participation ratio [HH] EH]. In order to make a more
rigorous statement one would then need to consider these two aspects at the same time,
but the absence of a general analytical characterization for either the eigenvalues or the
eigenvector entries makes this difficult.

One could also ask whether at given ⟨k⟩, B is always increasing with some measure
of the spread of degrees such as the variance ⟨k²⟩ − ⟨k⟩². For our admittedly limited choice
of graph ensembles it is certainly true that the scale-free (SF) graphs, which have the
broadest degree distributions, also give the largest B. Below them are the ER graphs.
Finally, the RER graphs, with their character intermediate between regular and ER,
also have prefactors B that lie between those of the ER and regular graphs.

6.3. Finite-size effects and scaling. 

We can also use our numerical simulation results to enquire about finite-size effects,
describing the behaviour of S(n) on graphs of large but finite size V. Our derivation of
B and its prediction using cavity techniques were done taking the large-V limit and so
cannot make statements about this regime; instead we will have to rely on physical
intuition to construct a suitable finite-size scaling ansatz.

From inspection of the numerical simulations, we can distinguish a number of time
regimes. Initially S(n) is linear in n with prefactor 1. This is greater than the large-n
prediction Bn with a prefactor B < 1, because the random walker has not yet had
much opportunity to return to previous sites; in particular one has, trivially, S(1) = 1,
ignoring the starting site $v_0$.

For larger n one finds the predicted linear growth with prefactor B < 1, i.e.
S(n) = Bn. Once Bn becomes comparable to V, a crossover to sublinear growth takes
place, and finally S(n) approaches V as the walker visits all sites for asymptotically
large n. These regimes, with the exception of the trivial small-n range, can be clearly
distinguished in figure 3, which shows results for fixed graph size V = 10'^ and graphs
with ⟨k⟩ = 4; plots for other graph sizes and average degrees look qualitatively identical.

Figure 2: Prefactor B predicted by the cavity method as a function of average degree, for
different graph types as shown in the legend. The lines represent hyperbolic fits; see
text for details. Note that the results for Reg and RER are essentially on top of each
other, and the same is true for ER and SF.

A plausible scaling ansatz that encompasses the various regimes, again without
the initial small-n piece, is

$$ S(n, V) = Bn \, f(Bn/V) \tag{54} $$

where the limiting behaviour of the scaling function must be

$$ f(x) \to 1 \ \ \text{for } x \ll 1, \qquad f(x) \to 1/x \ \ \text{for } x \gg 1 \tag{55} $$

to reproduce S(n, V) ~ Bn and S(n, V) ~ V when n is much smaller and much larger
than V, respectively.

In figure 4 we check to what extent the finite-size scaling ansatz captures our
simulation data. We show results for graph sizes V = 10^, 10^, 10^ and two values of
the average degree, ⟨k⟩ = 4, 10. By plotting S(n)/(Bn) vs Bn/V, with B predicted from
the cavity equations, we directly have a graphical representation of the scaling function
f(x). Very good agreement is seen between the three different graph sizes: these all
collapse onto the same curve, except in the initial regime discussed above where S(n) ~ n
and hence S(n)/(Bn) > 1. Beyond this we observe a plateau at S(n)/(Bn) = 1, which
in a different guise verifies our claim above that the cavity method does indeed predict
the prefactor B correctly. For x = Bn/V growing towards unity, the curves drop below
this plateau as expected, indicating the start of the saturation regime. Asymptotically
the scaling function f(x) then approaches 1/x, reflecting the final saturation of S(n) at
the upper bound V.

Figure 3: Finite-size effects: we show the walker behavior by plotting S(n)/V, i.e. the
fraction of distinct sites visited, derived from direct simulations, vs n/V. Results are
from averages over 1000 instances of graphs of fixed size V = 10^ and average degree
⟨k⟩ = 4, for different graph topologies: (a) Regular; (b) RER; (c) ER; (d) SF. The dashed
red lines show the cavity predictions Bn for the linear growth with n, a regime which
is clearer in the log-log plot insets. Beyond that one observes a slow crossover, with
S(n)/V eventually approaching unity. Solid lines show our phenomenological scaling
fits.
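
The rescaling used for such a collapse plot is simple to state in code. The sketch below is illustrative only: the function name is hypothetical, `counts[n]` is assumed to hold the (averaged) value of S(n), and B is assumed to be supplied from the cavity prediction.

```python
from typing import List, Sequence, Tuple


def collapse_coordinates(
    counts: Sequence[float], B: float, V: int
) -> List[Tuple[float, float]]:
    """Map S(n) for n >= 1 to scaling coordinates (x, y) = (B n / V, S(n) / (B n))."""
    # n = 0 is skipped because y would involve a division by zero.
    return [(B * n / V, s / (B * n)) for n, s in enumerate(counts) if n > 0]


# In the trivial initial regime S(n) = n, so y = 1/B > 1 for B < 1.
points = collapse_coordinates([0, 1, 2, 3], B=0.5, V=10)
```

Plotting y against x on logarithmic axes for several V would then show whether the curves fall onto a single master curve y = f(x).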

More surprising, and not required by our ansatz per se, is that we see in figure 4
good collapse also between graphs of different average degree: using Bn/V as the
argument of the scaling function seems sufficient to absorb all the variation with ⟨k⟩,
without further changes in f(x). The only exception is provided by the scale-free graphs,
which we discuss in more detail below.

Encouraged by the good agreement of the numerical data with the ansatz (54), we
attempt to find simple fits to the scaling function f(x). The simulation data show that
the crossover starts off with a roughly exponential departure from the small-x plateau
f(x) ≈ 1, which suggests a scaling function of the form $f(x) = a/\ln(b + (e^{a} - b)e^{ax})$,
where a and b are fitting parameters. Figure 4 shows that this form fits the data
extremely well, and except for the scale-free graphs the fits can be performed even with
fixed b = 1, leaving a single fit parameter.
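
A fit function of this type can be evaluated as in the sketch below. It assumes the form f(x) = a / ln(b + (e^a − b) e^{ax}), chosen because it satisfies both required limits, f(0) = 1 and f(x) → 1/x for large x; the function name is hypothetical. The logarithm is rewritten as ax + ln(b e^{−ax} + e^a − b) so that large x does not overflow the exponential.

```python
import math


def scaling_fit(x: float, a: float, b: float = 1.0) -> float:
    """Assumed fit form f(x) = a / ln(b + (e^a - b) * e^(a x)).

    Requires a > 0, x >= 0 and 0 <= b < e^a so that the logarithm is defined.
    """
    # ln(b + (e^a - b) e^{ax}) = a x + ln(b e^{-ax} + e^a - b), stable for large x.
    log_term = a * x + math.log(b * math.exp(-a * x) + math.exp(a) - b)
    return a / log_term


# Limits: f(0) = 1 (plateau) and x * f(x) -> 1 (saturation, S -> V).
plateau = scaling_fit(0.0, a=2.0)
saturation = 1000.0 * scaling_fit(1000.0, a=2.0)
```

With b fixed to 1 only the parameter a remains, which corresponds to the one-parameter fit mentioned in the text.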

We comment finally in more detail on the case of SF graphs. Here we see that
the data in figure 4 do not collapse perfectly for different V in the intermediate regime
where x = Bn/V is of order unity or somewhat smaller. In addition, the crossover in f(x)
is slower, with f(x) lower in the crossover region than for the other three graph types.
We conjecture that both of these effects are due to the presence of many small loops
in SF graphs, for example triangles (loops of length 3). To support this hypothesis, we
calculated the average number of triangles present in the different types of graph, taking
averages over 100 graph instances of size V = 10^. We found results in the same range
for Reg, ER and RER graphs, where the average percentage of nodes that are part of
at least one triangle does not exceed 2%, 7% and 37% for ⟨k⟩ = 4, 6, 10, whereas for
SF graphs the relevant fractions of nodes reach 9%, 24% and 51% for the same average
degrees. These results confirm that SF graphs generated via preferential attachment
contain a higher number of short loops than the other topologies. In fact it has been
shown by spectral arguments [28] that, even though the fraction of nodes in triangles
will tend to zero for V → ∞, the growth rate of the number of loops of length ℓ > 4
exceeds all polynomial growth rates; thus these graphs do not become locally treelike
for large V. It is therefore somewhat surprising that the cavity predictions for B are
quantitatively accurate even for SF graphs.
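
The triangle statistic quoted above, the fraction of nodes that belong to at least one triangle, can be computed as in the following sketch. The representation (a dictionary from node to its set of neighbours, for an undirected graph without self-loops) and the function name are assumptions made for the example.

```python
from typing import Dict, Hashable, Set


def fraction_in_triangles(neighbours: Dict[Hashable, Set[Hashable]]) -> float:
    """Fraction of nodes that are part of at least one triangle.

    Assumes an undirected simple graph: symmetric neighbour sets, no self-loops.
    """
    in_triangle = 0
    for node, nbrs in neighbours.items():
        # node is in a triangle if some neighbour u shares another neighbour with it.
        if any(neighbours[u] & nbrs for u in nbrs):
            in_triangle += 1
    return in_triangle / len(neighbours)


# Triangle 0-1-2 with a pendant node 3 attached to node 2: 3 of 4 nodes qualify.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
share = fraction_in_triangles(example)
```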

7. Conclusions. 

We have presented an analytical expression for the topology-dependent prefactor B
governing the linear regime for the average number of distinct sites S(n) visited by a
long (large n) random walk on a large random graph. We adapted the general results
derived for S(n) in terms of generating functions, as used to study d-dimensional lattices,
to the case of random networks. We then combined message-passing techniques and the
properties of Gaussian multivariate distributions to derive an expression for B that
is valid for locally tree-like graph structures, and found good agreement between the
theoretical predictions and direct numerical simulations. An intriguing feature of the
results is that at fixed average degree ⟨k⟩, B seems smallest for regular graphs and
increases with the width of the degree distribution, and one may conjecture that the
regular graph result B = (k − 2)/(k − 1) is in fact a lower bound.

We analysed finite-size effects for S(n, V) and proposed a simple scaling ansatz to
capture these. Apart from a trivial small-n regime, one finds a linear regime S ~ Bn
with prefactor B in accord with our predictions; an asymptotic regime Bn ≫ V where
the random walk saturates and S → V; and a crossover in between. Our data provide
excellent support for the scaling description, except possibly for SF graphs built via
preferential attachment, and we were able to provide a simple two-parameter (in fact
often one-parameter) fit for the scaling function.

Figure 4: Finite-size scaling of the number of distinct sites visited, showing y = S(n)/(Bn)
versus x = Bn/V. Data from direct simulations (symbols), with B predicted from the
cavity equations, are shown for graphs of sizes V = 10^, 10^, 10^ and average degrees
⟨k⟩ = 4, 10. The graph topologies are: (a) Regular; (b) RER; (c) ER; (d) SF. Very good
collapse onto a master curve y = f(x) is seen between the different average degrees
and, in (a, b, c), also between different V. The initial plateau at y = 1 shows the agreement
between direct simulations and cavity predictions. For larger x saturation sets in, with
f(x) ~ 1/x asymptotically (dotted black line).

The accurate results we obtained using message-passing techniques may open new 
perspectives in the analysis of random walks on networks. The cavity method we applied 
to study random walks on networks could be considered as a valid alternative tool to 
analyse other types of quantities related to this problem. For instance, one could develop
the model further by considering a set of N independent random walkers on a random
network and studying the behavior of the average number of distinct or common visited
sites, as has been done in the case of lattices [HIHIIII]. This could give insights into 
the occupancy statistics of packet-switched networks, where packets of data move by
independently hopping between nodes to transmit information between users. The general
character of our analysis suggests to us that it should be feasible to adapt it to the study
of this or similar types of questions that arise in the study of random walks on networks.

Appendices


A. The graph representation of the Gaussian covariate distribution. 


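We can rewrite the joint distribution introduced in the main text using
$R^{-1}_{ij}(z) = \delta_{ij} - z\, a_{ij}/\sqrt{k_i k_j}$, with $a_{ij}$ the entries of the
adjacency matrix. In this way we can separate the node and edge contributions,
respectively, to obtain a graphical model representation:

$$ e^{-x^{T} R^{-1}(z)\, x/2} = \prod_{i \in V} e^{-x_i^2/2} \prod_{(ij) \in E} e^{z\, x_i x_j/\sqrt{k_i k_j}} \tag{56} $$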


B. Regular graph case. 

We calculate an exact expression for the topology-dependent prefactor in the case of a
regular graph. Using $k_i = k$, $m_{i \to j} = m$, $v_j = v$, (48) and (51) we get:

$$ v = k\Big[k - \sum_{l \in \partial j} m\Big]^{-1} = k\,[k - km]^{-1} = k\Big[k - k\,\frac{1}{k-1}\Big]^{-1} = k\,\frac{k-1}{k(k-2)} \tag{57} $$

We substitute into the expression for the prefactor B to obtain:

$$ B = \frac{1}{Vk}\sum_{j \in V}\frac{k}{v} = \frac{1}{Vk}\,\frac{Vk(k-2)}{k-1} = \frac{k-2}{k-1} \tag{58} $$

Therefore the large-time limit of the average number of distinct sites visited by a random
walk on a k-regular graph is:

$$ \lim_{n \to \infty} S(n) = \frac{k-2}{k-1}\, n \tag{59} $$